Abstract

The sphere-packing bound $E_{sp}(R)$ bounds the reliability function for fixed-length block codes. For symmetric channels, it remains a valid bound even when strictly causal noiseless feedback is allowed from the decoder to the encoder. To beat the bound, the problem must be changed. It has long been known that variable-length block codes can do better when trading off error probability against expected block length. This correspondence shows that the fixed-delay setting also presents such an opportunity for generic channels.

While $E_{sp}(R)$ continues to bound the tradeoff between bit error probability and fixed end-to-end latency for symmetric channels used without feedback, a new bound called the "focusing bound" gives the limits on what can be done with feedback. If low-rate reliable flow control is free (i.e., the noisy channel has strictly positive zero-error capacity), then the focusing bound can be asymptotically achieved. Even when the channel has no zero-error capacity, it is possible to substantially beat the sphere-packing bound by synthesizing an appropriately reliable channel to carry the flow-control information.
I. Introduction

The two most fundamental parameters when it comes to reliable data transport are end-to-end system 
delay and the probability of error. Error probability is fundamental because a low probability of bit 
error lies at the heart of the digital revolution justified by the source/channel separation theorem. Delay 
is important because it is the most basic cost that a system must pay in exchange for reliability — it 
allows the laws of large numbers to be harnessed to smooth out the variability introduced by random 
communication channels. 

Traditionally, block-length has been used as a proxy for end-to-end delay since block-codes are easier 
to understand than non-block codes. Even when fixed end-to-end delay is desired, this paper shows that 
nonblock codes can provide a tremendous advantage when feedback is allowed. This short correspondence 
is a companion to our longer work in [1]. Some key results are reviewed in the next section, but the reader is referred to [1] for more details and motivation, as well as for a perspective on the existing results in the literature. The new contribution of this correspondence comes in Section III. It shows how to construct a special fixed-delay code over a DMC using noiseless feedback. This code beats the sphere-packing bound with fixed delay in the high-rate regime, even for channels (like the BSC) that have no zero-error capacity. A plot is given for the BSC with crossover probability 0.4 that provides an explicit counterexample to Pinsker's assertion (Theorem 8 in [2]) that this is impossible.

Simsek had earlier built codes for the BSC in [3], [4] that beat the sphere-packing bound with fixed delay. Those codes were fundamentally built upon the equivalence between scalar stabilization problems and feedback communication problems established in [5], [6], but they were hard to analyze. They also did not work at high rates. The advantages of the codes given here are their conceptual simplicity and the fact that they beat the sphere-packing bound in the high-rate regime. These codes do not do well in the low-rate regime, and it is clear that they could be combined with Simsek's codes to give some improvement at low rates. However, this would not be enough to reach the focusing bound, so there is much room for improvement.

II. Review 

A. Block coding 

The fundamental lower bound on error probability comes from the sphere-packing or volume bound, and this bound is also known to be achievable at high rates by random coding [7]. Reliable communication is not possible if, during the block, the channel behaves like one whose capacity is less than the target rate. Following [8] and [9], for block codes this idea immediately gives the following bound on the error exponent:

$$E^+(R) = \inf_{G:\,C(G) < R}\ \max_{r}\ D(G \| P \mid r) \qquad (1)$$

where $D(G \| P \mid r)$ is the divergence term that governs the exponentially small probability of the true channel $P$ behaving like channel $G$ when facing the input distribution coming from the codeword composition $r$.

Even with causal noiseless feedback, there is no way around this bound because channel capacity does not increase with feedback for memoryless channels. Without feedback, the bound can be tightened to the form traditionally known as the sphere-packing bound:

$$E_{sp}(R) = \max_{r}\ \min_{G:\,I(r,G) < R} D(G \| P \mid r) \qquad (2)$$

For symmetric channels, the optimizing codeword composition $r$ is always uniform and $E_{sp}(R) = E^+(R)$. Thus, for fixed-length block codes and symmetric DMCs, not only does causal feedback not improve capacity, but it does not improve reliability either [10]!
An alternate form for $E_{sp}(R)$ is given by

$$E_{sp}(R) = \max_{\rho \ge 0}\,[E_0(\rho) - \rho R] \qquad (3)$$

with the Gallager function $E_0(\rho)$ defined as:

$$E_0(\rho) = \max_{q}\ -\ln \sum_{y} \Big( \sum_{x} q_x\, p_{xy}^{\frac{1}{1+\rho}} \Big)^{1+\rho} \qquad (4)$$

Note that for symmetric channels, it suffices to use a uniform $q$ while optimizing (4). Also, since the random-coding error exponent is given by

$$E_r(R) = \max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R] \qquad (5)$$

it is clear that the sphere-packing bound is achievable, even without feedback, at rates close to $C$, since for those rates a $\rho \le 1$ optimizes both expressions [7]. The points on the sphere-packing bound where $\rho > 1$ are also achievable by random coding if the sense of "correct decoding" is slightly relaxed. Rather than forcing the decoder to emit a single estimated codeword, list decoding allows the decoder to emit a list of guessed codewords. The decoding is considered correct if the true codeword is on the list. For list decoding with list size $l$ in the context of random codes, Problem 5.20 in [7] reveals that

$$E_{r,l}(R) = \max_{0 \le \rho \le l}\,[E_0(\rho) - \rho R] \qquad (6)$$

is achievable. At high rates (where the maximizing $\rho$ is small), there is no benefit from relaxing to list decoding, but it makes a difference at low rates.
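The exponents (3), (5), and (6) differ only in the range of $\rho$, which makes them easy to evaluate side by side. The following is a minimal Python sketch, not taken from [7]: it assumes the uniform input distribution (sufficient for symmetric channels, as noted above), measures rates in nats, and replaces the maximization over $\rho$ by a grid search with an arbitrarily chosen step. The function names are illustrative only.

```python
import math

def gallager_e0(rho, P):
    """E_0(rho) from Eq. (4), evaluated at the uniform input distribution.

    P[x][y] is the channel transition probability p_{xy}.
    """
    nx = len(P)
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(P[x][y] ** (1.0 / (1.0 + rho)) / nx for x in range(nx))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def exponent(R, P, rho_max, step=1e-3):
    """max of E_0(rho) - rho*R over the grid 0, step, 2*step, ..., rho_max."""
    n_steps = int(round(rho_max / step))
    return max(gallager_e0(k * step, P) - k * step * R for k in range(n_steps + 1))

def bsc(p):
    return [[1 - p, p], [p, 1 - p]]

P = bsc(0.4)
R = 0.005                       # nats per channel use (C is about 0.0201 nats here)
e_r = exponent(R, P, 1.0)       # random coding, Eq. (5)
e_r4 = exponent(R, P, 4.0)      # list decoding with l = 4, Eq. (6)
e_sp = exponent(R, P, 50.0)     # sphere packing, Eq. (3), rho truncated at 50
print(f"E_r = {e_r:.5f}, E_r,4 = {e_r4:.5f}, E_sp = {e_sp:.5f}")
```

Because all three searches share the same grid points, the ordering $E_r \le E_{r,l} \le E_{sp}$ holds exactly in the computed values, mirroring the nesting of the $\rho$ ranges.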

B. Non-block codes 

Another classical approach to the problem of reliable communication is to consider codes without a block structure. Convolutional and tree codes represent the prototypical examples. It was realized early on that with an infinite-constraint-length¹ convolutional code under ML decoding, all bits will eventually be decoded correctly [7]. However, if the end-to-end delay is forced to be bounded, then the bit error probability with delay is governed by $E_r(R)$ for random convolutional codes, even when the constraint lengths are unbounded [11]. This performance with delay is also achievable using an appropriately biased sequential decoder [12]. A nice feature of sequential decoders is that they are not tuned to any target delay; they can be prompted for estimates at any time and will give the best estimate that they have. Thus an infinite-constraint-length convolutional code with appropriate sequential decoding achieves the delay exponent $E_r(R)$ universally over all (sufficiently long) delays.

Pinsker claimed in [2] that the sphere-packing bound continued to bound the performance of nonblock 
codes both with and without feedback. He had proofs for the BSC case, but asserted that the result held 
more generally. While he was right for the without feedback case, it turns out that there is a subtle flaw 
in his argument regarding the case with feedback. 

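1) The BEC example: This example, repeated from [1] for the reader's convenience, shows the power of feedback in the delay context. The binary erasure channel with erasure probability $\delta < 1/2$ used at bit rate $R = 1/2$ gives a counterexample to Pinsker's conjecture. The BEC is so simple that everything can be understood with a minimum of overhead. Without feedback, the relevant exponent is the sphere-packing bound, which at $R = 1/2$ is the exponent of the probability that the channel erases at least half of the block:

$$E_{sp}(1/2) = \frac{1}{2}\ln\frac{1}{2\delta} + \frac{1}{2}\ln\frac{1}{2(1-\delta)} \qquad (7)$$

For $\delta = 0.4$, this corresponds to an error exponent of about 0.02. Even with feedback, there is no way for a fixed-block-length code to beat this exponent. If the channel lets fewer than $n/2$ bits through during a block of length $n$, it is impossible to reliably communicate an $n/2$-bit message!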
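The closed form (7) can be cross-checked against the Gallager form (3). A minimal Python sketch, assuming uniform inputs and rates in nats (so $R = 1/2$ bit becomes $\frac{1}{2}\ln 2$ nats); the helper names `bec_e0` and `e_sp_numeric` are illustrative:

```python
import math

def bec_e0(rho, delta):
    # E_0(rho) of Eq. (4) for the BEC with uniform inputs: the erasure output
    # contributes delta, and the two unerased outputs together contribute
    # (1 - delta) * 2**(-rho).
    return -math.log(delta + (1 - delta) * 2.0 ** (-rho))

def e_sp_numeric(R, delta, rho_max=5.0, step=1e-4):
    # Eq. (3) by grid search over 0 <= rho <= rho_max.
    n_steps = int(round(rho_max / step))
    return max(bec_e0(k * step, delta) - k * step * R for k in range(n_steps + 1))

delta = 0.4
R = 0.5 * math.log(2)           # 1/2 bit per channel use, in nats
closed_form = 0.5 * math.log(1 / (2 * delta)) + 0.5 * math.log(1 / (2 * (1 - delta)))
numeric = e_sp_numeric(R, delta)
print(f"{closed_form:.4f} {numeric:.4f}")  # 0.0204 0.0204
```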



¹ More precisely, these are unbounded-constraint-length codes, since at any finite time there are only finitely many data bits so far.
Fig. 1. The birth-death Markov chain governing the rate-1/2 feedback communication system over an erasure channel.

If causal noiseless feedback is available, the natural nonblock code just retransmits a bit until it is correctly received. As bits arrive steadily at rate $R = 1/2$, they enter a FIFO queue of bits awaiting transmission. If we look at the queue state every two channel uses, it can be modeled (see Fig. 1) as a birth-death Markov chain with a $\delta^2$ probability of birth and a $(1-\delta)^2$ probability of death. The stationary probability of a backlog of $k$ bits therefore decays like $(\delta/(1-\delta))^{2k}$, and since one bit arrives every two channel uses, such a backlog corresponds to a delay of about $2k$ channel uses. Converting that into an error exponent with delay $d$ gives the delay exponent

$$\ln(1-\delta) - \ln(\delta) \qquad (8)$$

Plugging in $\delta = 0.4$ gives an exponent of more than 0.40. This is about twenty times higher than the sphere-packing bound!
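The step from the birth-death chain to (8) can be checked numerically. A minimal Python sketch, assuming the chain of Fig. 1 truncated at a hypothetical maximum backlog `K`, with reflecting boundaries, and solved by power iteration; the fitted tail slope is converted to a per-channel-use exponent using the "one bit per two channel uses" arrival rate:

```python
import math

def stationary_queue(delta, K=60, iters=2000):
    """Stationary law of the Fig. 1 chain, truncated at K queued bits.

    Each step is two channel uses. The backlog grows by one when both uses
    are erased (w.p. delta**2), shrinks by one when both succeed
    (w.p. (1 - delta)**2), and is otherwise unchanged.
    """
    p, q = delta ** 2, (1 - delta) ** 2
    pi = [1.0] + [0.0] * K          # start from an empty queue
    for _ in range(iters):          # power iteration
        new = [0.0] * (K + 1)
        for k, mass in enumerate(pi):
            up = p if k < K else 0.0
            down = q if k > 0 else 0.0
            if k < K:
                new[k + 1] += mass * up
            if k > 0:
                new[k - 1] += mass * down
            new[k] += mass * (1.0 - up - down)
        pi = new
    return pi

def delay_exponent(delta, k1=5, k2=25):
    pi = stationary_queue(delta)
    tail = lambda k: sum(pi[k:])
    # A backlog of k bits at rate 1/2 means roughly 2k channel uses of delay.
    return (math.log(tail(k1)) - math.log(tail(k2))) / (2 * (k2 - k1))

delta = 0.4
estimate = delay_exponent(delta)
exact = math.log(1 - delta) - math.log(delta)                                   # Eq. (8)
e_sp = 0.5 * math.log(1 / (2 * delta)) + 0.5 * math.log(1 / (2 * (1 - delta)))  # Eq. (7)
print(f"{estimate:.4f} {exact:.4f} ratio={exact / e_sp:.1f}")  # 0.4055 0.4055 ratio=19.9
```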



C. The focusing bound 

Restricting attention to symmetric channels, the BEC case can be abstracted to get a general bound 
on the probability of error with delay. [1] calls this bound the "focusing bound" because it is based on 
the idea of having the encoder focus as much of the decoder's uncertainty as possible onto bits whose 
deadlines are not pending. 

Definition 2.1: A rate $R$ encoder with noiseless feedback is a sequence of maps $\mathcal{E}_t$. The range of each map is the discrete set $\mathcal{X}$. The $t$-th map takes as input the available data bits $B_1^{\lfloor Rt \rfloor}$, as well as all the past channel outputs $Y_1^{t-1}$.

Randomized encoders with noiseless feedback also have access to a continuous uniform random variable $W_t$ denoting the common randomness available in the system.

Definition 2.2: A delay $d$ rate $R$ decoder is a sequence of maps $\mathcal{D}_i$. The range of each map is just an estimate $\hat{B}_i$ for the $i$-th bit taken from $\{0, 1\}$. The $i$-th map takes as input the available channel outputs $Y_1^{\lceil i/R \rceil + d}$, which means that it can see $d$ time units beyond when the bit to be estimated first had a chance to impact the channel inputs.

Randomized decoders also have access to all the continuous uniform random variables $W_t$.

Definition 2.3: The fixed-delay error exponent $\alpha$ is asymptotically achievable at rate $R$ across a noisy channel if for every delay $d_j$ in some increasing sequence $d_j \to \infty$ there exist rate $R$ encoders and delay $d_j$ decoders $(\mathcal{E}^{d_j}, \mathcal{D}^{d_j})$ that satisfy the following properties when used with input bits $B_i$ drawn from iid fair coin tosses.

1) For every $j$, there exists an $\epsilon_j < 1$ so that $P(B_i \neq \hat{B}_i(d_j)) < \epsilon_j$ for every $i \ge 1$. The $\hat{B}_i(d_j)$ represents the delay-$d_j$ estimate of $B_i$ produced by the $(\mathcal{E}^{d_j}, \mathcal{D}^{d_j})$ pair connected to the input $B$ and the channel in question.

2) $\lim_{j \to \infty} \frac{-\ln \epsilon_j}{d_j} \ge \alpha$

The exponent $\alpha$ is asymptotically achievable universally over delay, or in an anytime fashion, if a single encoder $\mathcal{E}$ can be used for all $d_j$ above.

Theorem 2.4 (Focusing bound from [1]): For a discrete memoryless channel, no delay exponent $\alpha > E_a(R)$ is asymptotically achievable, even if the encoders are allowed access to noiseless feedback, with

$$E_a(R) = \inf_{0 < \lambda < 1} \frac{E^+(\lambda R)}{1 - \lambda} \qquad (9)$$
where $E^+$ is the Haroutunian exponent from (1). When the DMC is symmetric, $E_a(R)$ can be expressed parametrically as:

$$E_a(R) = E_0(\eta), \qquad R = \frac{E_0(\eta)}{\eta} \qquad (10)$$

where $E_0(\eta)$ is the Gallager function from (4), and $\eta$ ranges from $0$ to $\infty$.
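Because $E_0$ is concave with $E_0(0) = 0$, the ratio $E_0(\eta)/\eta$ is nonincreasing, so for $0 < R < C$ the parametric form (10) can be evaluated by bisection on $\eta$. A minimal Python sketch, assuming uniform inputs (symmetric channels), rates in nats, and a hypothetical search interval for $\eta$ that contains the solution; as a sanity check it reproduces the BEC exponent (8) at $R = 1/2$ bit:

```python
import math

def gallager_e0(rho, P):
    # E_0(rho) of Eq. (4) at the uniform input distribution (symmetric DMC).
    nx = len(P)
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(P[x][y] ** (1.0 / (1.0 + rho)) / nx for x in range(nx))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def focusing_bound(R, P, lo=1e-9, hi=100.0, iters=200):
    """E_a(R) from the parametric form (10); assumes the eta solving
    E_0(eta)/eta = R lies in (lo, hi)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)        # bisect on a log scale
        if gallager_e0(mid, P) / mid > R:
            lo = mid
        else:
            hi = mid
    return gallager_e0(lo, P)

delta = 0.4
bec = [[1 - delta, delta, 0.0], [0.0, delta, 1 - delta]]   # outputs: 0, erasure, 1
print(focusing_bound(0.5 * math.log(2), bec))              # close to ln(1.5)
```

The agreement with $\ln(1-\delta) - \ln\delta$ illustrates the statement below that the simple retransmission code attains the focusing bound on the BEC.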



D. The (n, c, l) family of codes

The focusing bound is attained for the BEC with feedback using the natural "repeat bits until successful" 
code. As demonstrated in [1], it can also be asymptotically attained for any noisy channel provided we 
have access to a low-rate channel that can deliver perfectly noiseless flow-control bits from the encoder 
to the decoder. The code is reviewed below. 

Call $c > 1$ the chunk length, $2^l$ the list length, and $n > l$ the data block length. The $(n, c, l)$ scheme is:

• Queue up incoming bits and assemble them into blocks of $n$ bits. If there is less than a full block of bits still awaiting transmission, just idle by transmitting an arbitrary input letter.

• At every noisy channel use, the encoder sends the channel input corresponding to the next position in an infinite-length random codeword associated with the current data block, where the random codewords are drawn iid using the appropriate input distribution² over the noisy channel's input alphabet.

• If the time is an integer multiple of $c$, use the noiselessly fed-back channel outputs to simulate the decoder's attempt to decode the current codeword to within a list of the top $2^l$ items. If the true data block is one of the $2^l$ items, send a 1 over the noiseless flow-control link. Also send the $l$ disambiguating bits representing the true block's index within the decoder's list, and remove the current block of bits from the main data queue. If the true block is not in the decoder's list, just send a 0 over the noiseless flow-control link.

• At the decoder, the encoder queue length is known perfectly since it can only change by the deterministic arrival of data bits or when a noise-free confirm or deny bit has been sent over the flow-control link. Thus the decoder always knows which input block a given channel output $Y_t$ or fortified symbol $S_t$ corresponds to.

• If the time is an integer multiple of $c$ and the decoder receives a 1 noiselessly, then it decodes what it has seen to a list of the top $2^l$ possibilities for this block. It will use the next $l$ noise-free flow-control bits to disambiguate this list and will use the result as its estimate for the block.
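The flow-control bookkeeping in the bullets above can be sketched as a small simulation. This is a minimal Python sketch under assumptions that are not part of the scheme's specification: the arrival rate is given in bits per channel use, blocks hold $n$ bits, and the encoder's simulation of the decoder's list decoding is replaced by a hypothetical callback `list_success(block, uses)`. The sketch checks the point of the fourth bullet: from the confirm/deny bits alone, the decoder recomputes which block every channel use belongs to.

```python
import math

def run_encoder(T, n, R, c, list_success):
    """Encoder side of the (n, c, l) bookkeeping for T channel uses.

    Returns the block index sent at each channel use (None = idle) and the
    confirm/deny bit sent at each chunk boundary t = c, 2c, ...
    """
    block, uses = 0, 0
    labels, flow = [], []
    for t in range(1, T + 1):
        ready = math.floor(R * t) // n      # complete n-bit blocks arrived so far
        if block < ready:
            labels.append(block)            # next symbol of this block's codeword
            uses += 1
        else:
            labels.append(None)             # idle: arbitrary input letter
        if t % c == 0:
            confirm = uses > 0 and list_success(block, uses)
            flow.append(confirm)            # a 1 (followed by l index bits) or a 0
            if confirm:
                block, uses = block + 1, 0  # drop the block from the queue
    return labels, flow

def decoder_labels(T, n, R, c, flow):
    """Decoder side: uses only the known arrival rate and the flow bits."""
    block, labels = 0, []
    for t in range(1, T + 1):
        ready = math.floor(R * t) // n
        labels.append(block if block < ready else None)
        if t % c == 0 and flow[t // c - 1]:
            block += 1
    return labels

# Hypothetical stand-in: list decoding succeeds once a block has had 6 channel uses.
enc, flow = run_encoder(T=60, n=4, R=0.5, c=3, list_success=lambda b, u: u >= 6)
dec = decoder_labels(T=60, n=4, R=0.5, c=3, flow=flow)
print(enc == dec, sum(flow))  # True 6
```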

Such schemes are shown in [1] to be asymptotically optimal: 

Theorem 2.5: By appropriate choice of $(n, c, l)$, it is possible to asymptotically achieve all delay exponents $\alpha < E_0(\rho)$ for $R = E_0(\rho)/\rho$ for the fortified system built around a DMC by adding a rate-$\kappa$ noise-free forward flow-control link, where $\kappa$ can be made as small as desired.



III. Synthesizing a pathway to carry flow-control information 

These codes use time-sharing of the channel to split it into two parts. One part carries the data and the 
other part carries flow control information. 

A. Channels with positive zero-error capacity 

The fortified communication scheme is easily adapted to channels with strictly positive zero-error capacity by just using the feedback zero-error capacity to carry the flow-control information [1]. There is no $\kappa$. Instead, let $\theta$ be the block length required to realize feedback zero-error transmission of at least $l + 1$ bits. As illustrated in Fig. 2, terminate each chunk with a length-$\theta$ feedback zero-error code and use it to transmit the flow-control information.

² Use the $E_0(\eta)$-maximizing input distribution for the $\eta$ such that the data rate is $R = E_0(\eta)/\eta$.

Fig. 2. One block's transmission in the $(n, c, l, \theta)$ code for noisy channels. The $\theta$ channel uses at the end of each chunk carry the $l + 1$ flow-control bits reliably. If the channel has a strictly positive feedback zero-error capacity, $\theta$ does not scale with $c$. If it does not, $\theta$ is proportional to $c$.

If the chunk size is $c$, then it is as though we are operating with only a fraction $(1 - \theta/c)$ of the channel uses. The overhead tends to zero by making the chunks long, giving the following corollary to Theorem 2.5:

Corollary 3.1: By appropriate choice of $(n, c, l)$, it is possible to asymptotically achieve all delay exponents $\alpha < E_0(\rho)$ for $R = E_0(\rho)/\rho$ for any channel with strictly positive feedback zero-error capacity $C_{0F} > 0$.

B. Channels without zero-error capacity 

When the channel has no zero-error capacity, we can still allocate $\theta$ channel uses per chunk to carry flow-control information and have the encoder just assume that it was received correctly. This can be done by using an infinite-constraint-length time-varying random convolutional code.³ This gives a delay-universal scheme that is guaranteed to eventually get the flow-control information across correctly. Unlike a zero-error code, all that such a code can guarantee is that the probability of error in the entire message-stream prefix is exponentially small in the number of channel uses that have occurred in the code since that message-stream prefix was determined.

The flow-control information can be viewed as low-rate "punctuation" that tells the decoder how to parse the channel outputs that are carrying the data itself. Essentially, the punctuation gives "commas" that separate out the different message blocks.⁴ Here, we assume that the decoder uses its current best estimate of the punctuation to re-parse the history of the data-carrying stream. Then the data-carrying channel outputs are decoded assuming that the flow-control information is correct. Any bits that have reached their deadlines are emitted, but this does not prevent the decoder from re-parsing them in the future.

Consequently, an error can occur at the decoder in two different ways. As before, the data-carrying stream could be corrupted due to channel atypicality in those slots. However, the flow-control stream could also become corrupted. As a result, $\theta$ must be kept proportional to the chunk length $c$ to avoid having the flow-control messages cause too many errors. The effective rate of the flow-control information therefore goes to zero as $c \to \infty$, and the relevant error exponent is about $E_0(1)$. Balancing the error probabilities and optimizing over the choice of $\theta$ gives the following theorem:

Theorem 3.1: By appropriate choice of $(n, c, l, \theta)$, it is possible to asymptotically achieve all delay exponents $\alpha < E'(R)$, where the tradeoff curve is given parametrically by varying $\rho \in (0, \infty)$:

$$E'(\rho) = \left(\frac{1}{E_0(\rho)} + \frac{1}{E_0(1)}\right)^{-1} \qquad (11)$$

$$R(\rho) = \frac{E'(\rho)}{\rho} \qquad (12)$$

³ See [1] for some tricks that allow such a code to be operated with feedback at bounded expected computational complexity, at least at low rates.

⁴ As well as disambiguating any lists.

Proof: For simplicity of exposition, we assume that the block length $n$, chunk size $c$, and list length $2^l$ are large enough that the code essentially achieves the focusing bound for whatever the effective rate is. The various $\epsilon$ terms are ignored.

Let $\psi$ be the proportion of channel uses dedicated as overhead to run the low-rate flow-control channel. So the effective chunk size in the data stream is $c' = c(1 - \psi)$. The effective rate of the message stream is thereby increased to $\frac{R}{1-\psi}$. Assuming that the flow-control information is correct, the delay-universal error exponent is thus $E_a\!\left(\frac{R}{1-\psi}\right)$ with respect to the delay in terms of code channel uses. But there are only $(1 - \psi)$ code channel uses per unit of actual time, and so the delay exponent is $(1 - \psi)E_a\!\left(\frac{R}{1-\psi}\right)$ with respect to true delay.

Meanwhile $\theta = c\psi$. The effective flow-control information rate is $\frac{l+1}{c\psi} \approx 0$, since $c$ can be made as large as we want. Since this code achieves the random-coding error exponent, the delay-universal error exponent for the flow-control stream is essentially $E_0(1)$ with respect to flow-code channel uses, since that is the zero-rate point for random coding. But there are only $\psi$ flow-code channel uses per unit of actual time, and so the delay exponent for the flow-control stream is actually $\psi E_0(1)$ with respect to true delay.

Pick a large fixed delay $d$. It can be written as $d = d_f + d_m$ in $d$ different ways. Let $d_f$ be the part of the delay that is "burned" by the flow-control stream. Thus, with probability exponentially small in $d_f$, this suffix of time has possibly incorrect flow-control information and so cannot be trusted to be interpreted correctly. Thus, the performance of the code with delay $d$ is like the performance of the underlying $(n, c', l)$ code with delay $d_m$. Since the channel uses are disjoint, the two error events are independent, and thus the achieved exponent is the weighted average of the two error exponents. Balancing the exponents of the two parts tells us to set:

$$E' = \psi E_0(1) = (1 - \psi)E_a\!\left(\frac{R}{1-\psi}\right)$$

with the resulting error probability with delay being governed by $\approx d \exp(-d \psi E_0(1))$. The polynomial term $d$ in front is dominated entirely by the exponential decay and can be ignored.
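The balancing condition can also be solved numerically for a given rate. A minimal Python sketch for the BSC with crossover probability 0.4, assuming uniform inputs, rates in nats, and the parametric form (10) for $E_a$; the function names and the rate $R = 0.015$ are illustrative choices. The left side $\psi E_0(1)$ increases in $\psi$ while the right side decreases to zero as $R/(1-\psi) \to C$, so a bisection over $\psi \in (0, 1 - R/C)$ finds the balance point.

```python
import math

def gallager_e0(rho, P):
    # E_0(rho) of Eq. (4) at the uniform input distribution.
    nx = len(P)
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(P[x][y] ** (1.0 / (1.0 + rho)) / nx for x in range(nx))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def focusing_bound(R, P, lo=1e-9, hi=100.0, iters=200):
    # Parametric form (10): find eta with E_0(eta)/eta = R by log-scale bisection.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if gallager_e0(mid, P) / mid > R:
            lo = mid
        else:
            hi = mid
    return gallager_e0(lo, P)

def balance(R, P, C, iters=60):
    """Bisection for psi with psi*E_0(1) = (1 - psi)*E_a(R/(1 - psi))."""
    e01 = gallager_e0(1.0, P)
    a, b = 0.0, 1.0 - R / C            # the data stream needs R/(1 - psi) < C
    for _ in range(iters):
        psi = 0.5 * (a + b)
        gap = psi * e01 - (1.0 - psi) * focusing_bound(R / (1.0 - psi), P)
        if gap < 0:
            a = psi
        else:
            b = psi
    return a, a * e01                  # overhead fraction psi and E' = psi*E_0(1)

p = 0.4
P = [[1 - p, p], [p, 1 - p]]
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)   # BSC capacity, nats
R = 0.015
psi, e_prime = balance(R, P, C)
print(f"psi = {psi:.4f}, E' = {e_prime:.6f}")
```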
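Using the parametric form (10) for $E_a$, with parameter $\rho$, we get a pair of equations:

$$\psi E_0(1) = (1 - \psi)E_0(\rho) \qquad (13)$$

$$\frac{E_0(\rho)}{\rho} = \frac{R}{1-\psi} \qquad (14)$$

The first thing to notice is that simple substitution gives

$$R = \frac{(1-\psi)E_0(\rho)}{\rho} = \frac{\psi E_0(1)}{\rho} = \frac{E'}{\rho}$$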

Solving (13) for $\psi$ shows (after a little algebra) that

$$\psi = \frac{E_0(\rho)}{E_0(1) + E_0(\rho)}, \qquad 1 - \psi = \frac{E_0(1)}{E_0(1) + E_0(\rho)}$$

With this choice, the first equation is clearly true, and

$$E' = \psi E_0(1) = \frac{E_0(\rho)E_0(1)}{E_0(1) + E_0(\rho)} = \left(\frac{1}{E_0(\rho)} + \frac{1}{E_0(1)}\right)^{-1}$$

Similarly, $\frac{1}{1-\psi} = 1 + \frac{E_0(\rho)}{E_0(1)}$, so using $R = E'/\rho$,

$$\frac{R}{1-\psi} = \frac{E'}{\rho}\cdot\frac{E_0(1) + E_0(\rho)}{E_0(1)} = \frac{E_0(\rho)}{\rho}$$

and thus the second equation is also true. This establishes the theorem. □

Fig. 3. The reliability functions for the binary symmetric channel with crossover probability $\delta = 0.4$. The sphere-packing bound approaches capacity quadratically flat, while the focusing bound and the new scheme both approach the capacity point linearly.
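The parametric curve (11)-(12) is easy to tabulate. A minimal Python sketch for the BSC with crossover probability 0.4, assuming uniform inputs and rates in nats, with a grid search for $E_{sp}$ as in (3); the names and the sampled values of $\rho$ are illustrative. For each $\rho$ it records the rate, $E'$, the closed-form overhead fraction $\psi$ from the proof, and the sphere-packing exponent at the same rate.

```python
import math

def gallager_e0(rho, P):
    # E_0(rho) of Eq. (4) at the uniform input distribution.
    nx = len(P)
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(P[x][y] ** (1.0 / (1.0 + rho)) / nx for x in range(nx))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def e_sp(R, P, rho_max=20.0, step=1e-3):
    # Sphere-packing exponent (3) by grid search over 0 <= rho <= rho_max.
    n_steps = int(round(rho_max / step))
    return max(gallager_e0(k * step, P) - k * step * R for k in range(n_steps + 1))

P = [[0.6, 0.4], [0.4, 0.6]]
e01 = gallager_e0(1.0, P)
rows = []
for rho in (0.05, 0.1, 0.2):
    e0 = gallager_e0(rho, P)
    e_prime = 1.0 / (1.0 / e0 + 1.0 / e01)    # Eq. (11)
    R = e_prime / rho                         # Eq. (12)
    psi = e0 / (e01 + e0)                     # overhead fraction from the proof
    rows.append((rho, R, e_prime, psi, e_sp(R, P)))
    print(f"rho={rho}: R={R:.5f} nats, E'={e_prime:.5f}, E_sp={rows[-1][4]:.5f}")
```

At these high rates (close to $C \approx 0.0201$ nats), the computed $E'$ exceeds the sphere-packing exponent, consistent with Fig. 3.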
The superiority of these exponents to the sphere-packing bound in the high-rate regime is immediately clear, since they are basically like the focusing bound in form. Some algebra and simple calculus reveal that, in the vicinity of the $(C, 0)$ point, the focusing bound⁵ has slope of magnitude $2C/|E_0''(0)|$, while the $E'(R)$ curve achieved by Theorem 3.1 has the lower slope magnitude $E_0(1)/\big(C + E_0(1)|E_0''(0)|/(2C)\big)$. Put another way, in generic cases, the reliability drops linearly in the neighborhood of capacity rather than in a quadratically flat manner. Fig. 3 illustrates the bounds for a BSC with crossover probability 0.4.
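The two slopes quoted above can be checked numerically. A minimal Python sketch for the BSC with crossover probability 0.4, assuming uniform inputs and rates in nats; $C = E_0'(0)$ and $|E_0''(0)|$ are estimated by central differences with an arbitrarily chosen step `h`, and the secant slopes from $(C, 0)$ at a small parameter value are compared with the closed-form expressions.

```python
import math

def gallager_e0(rho, P):
    # E_0(rho) of Eq. (4) at the uniform input distribution; it is smooth
    # around rho = 0, so small negative rho is fine for finite differences.
    nx = len(P)
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(P[x][y] ** (1.0 / (1.0 + rho)) / nx for x in range(nx))
        total += inner ** (1.0 + rho)
    return -math.log(total)

P = [[0.6, 0.4], [0.4, 0.6]]
h = 1e-3
C = (gallager_e0(h, P) - gallager_e0(-h, P)) / (2 * h)          # E_0'(0)
curv = -(gallager_e0(h, P) - 2 * gallager_e0(0.0, P)
         + gallager_e0(-h, P)) / h ** 2                         # |E_0''(0)|
e01 = gallager_e0(1.0, P)

slope_focus = 2 * C / curv                      # focusing bound near (C, 0)
slope_new = e01 / (C + e01 * curv / (2 * C))    # E'(R) curve near (C, 0)

# Secant slopes from (C, 0) at a small parameter value.
eta = 1e-3
e0 = gallager_e0(eta, P)
secant_focus = e0 / (C - e0 / eta)              # Eq. (10)
e_prime = 1.0 / (1.0 / e0 + 1.0 / e01)          # Eq. (11) with rho = eta
secant_new = e_prime / (C - e_prime / eta)      # Eq. (12)
print(f"focusing: {slope_focus:.4f} vs {secant_focus:.4f}; new: {slope_new:.4f} vs {secant_new:.4f}")
```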



IV. Conclusions 

Even when there is no zero-error capacity, flow control can be used to substantially beat the sphere-packing bound with respect to delay at high rates. The arguments from [1] dealing with fixed-delay feedback also apply to the new code and show that the reliabilities achieved here are still asymptotically achievable even if the feedback is delayed. The key is that our flow-control code does not need instantaneous feedback to achieve its internal reliability target $\approx E_0(1)$.

⁵ When $E_0''(0) = 0$, page 143 in [7] reveals that the sphere-packing bound is a straight line hitting zero at capacity. In such cases, the focusing bound is bounded away from zero even in the neighborhood of capacity, and hence this curve has an infinite slope.
We conjecture that the gap between the focusing bound and the reliabilities achieved by our scheme in the no-zero-error case is due to our a priori splitting of the channel into dedicated data and flow-control links. The parallel channel coding advantage tells us that splitting a channel generally results in a loss of reliability. The codes in [3] performed much better in the low-rate regime because they carried the flow-control information implicitly within the message stream itself.